feat: add evaluation-level expectedOutput to EvaluationItem #1387
Merged
Add an optional `expectedOutput` field at the evaluation level so output-based evaluators can share a common expected output instead of duplicating it in every evaluator's criteria entry.

Resolution order:
1. Per-evaluator criteria expectedOutput (highest priority)
2. Evaluation-level expectedOutput (fallback)
3. Evaluator config default / error (existing behavior)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
andrei-rusu approved these changes on Feb 27, 2026
Add evaluation set JSON files exercising the new expectedOutput field on EvaluationItem and a matching testcase with run.sh + assert.py that validates scores from deterministic and LLM judge evaluators. Bump version to 2.11.0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
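A minimal sketch of the kind of check such an `assert.py` might perform. The result shape, the evaluator names used as keys, and the LLM-judge threshold are all assumptions for illustration, not the actual testcase:

```python
def validate_scores(results: dict) -> None:
    """Hypothetical score validation for the eval-set testcase."""
    # Deterministic evaluators should score the expected output exactly.
    assert results["ExactMatch"] == 1.0
    assert results["JsonSimilarity"] == 1.0
    # The LLM judge is non-deterministic; accept anything above a floor.
    assert results["LLMJudgeOutput"] >= 0.7
```

In the real testcase, `run.sh` would produce the evaluation results and `assert.py` would load and validate them along these lines.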
uipath-langchain==0.7.11 pins uipath>=2.10.0,<2.11.0, so bumping to 2.11.0 breaks the cross-compatibility testcase. The version bump should be coordinated with a uipath-langchain release. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
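The incompatibility is mechanical: 2.11.0 does not satisfy `>=2.10.0,<2.11.0`. A pure-Python sketch of the check (simplified dotted-integer comparison, no pre-release or epoch handling, which a real tool like `packaging` would provide):

```python
def satisfies_pin(version: str, low: str = "2.10.0", high: str = "2.11.0") -> bool:
    """Check low <= version < high by comparing dotted-integer tuples."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(low) <= as_tuple(version) < as_tuple(high)

print(satisfies_pin("2.10.5"))  # True: within the pin
print(satisfies_pin("2.11.0"))  # False: the proposed bump falls outside it
```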
What

Adds an optional `expectedOutput` field at the evaluation item level in the v1.0 schema. Output-based evaluators (ExactMatch, JsonSimilarity, LLMJudgeOutput) can now read expected output from the evaluation itself, instead of requiring it to be duplicated in every evaluator's `evaluationCriterias` entry.

Why
Today, when three evaluators need the same expected output, you write it three times:
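As an illustration (the field names beyond `expectedOutput` and `evaluationCriterias` are assumptions, not the exact v1.0 schema), the duplicated form looks roughly like:

```json
{
  "inputs": { "query": "What is the capital of France?" },
  "evaluationCriterias": {
    "ExactMatch": { "expectedOutput": { "answer": "Paris" } },
    "JsonSimilarity": { "expectedOutput": { "answer": "Paris" } },
    "LLMJudgeOutput": { "expectedOutput": { "answer": "Paris" } }
  }
}
```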
With this change you write it once:
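Roughly (again a sketch, not the exact schema; whether the per-evaluator entries become empty or are omitted entirely is an assumption here), the evaluation-level form:

```json
{
  "inputs": { "query": "What is the capital of France?" },
  "expectedOutput": { "answer": "Paris" },
  "evaluationCriterias": {
    "ExactMatch": {},
    "JsonSimilarity": {},
    "LLMJudgeOutput": {}
  }
}
```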
Per-evaluator override still works: if a criteria entry has its own `expectedOutput`, it wins.

How it works
Model: one new optional field on `EvaluationItem`.

Runtime: 15 lines of merge logic in `_execute_eval()` before criteria is passed to evaluators. When an evaluator takes `OutputEvaluationCriteria` and the evaluation has `expectedOutput`:
- no criteria entry → use `{"expectedOutput": eval_item.expected_output}`
- criteria entry without `expectedOutput` → merge it in
- criteria entry with its own `expectedOutput` → keep as-is (per-evaluator wins)

Backward compatibility
None: existing eval-set JSONs parse and run without any changes.

Tests
25 new tests across 7 test classes covering:
All 1574 tests pass (25 new + 1549 existing, zero regressions).
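The runtime merge described under "How it works" can be sketched in isolation. The function name and argument shapes below are assumptions for illustration; the real logic lives inline in `_execute_eval()`:

```python
from typing import Optional


def merge_expected_output(
    criteria: Optional[dict],
    eval_expected_output: Optional[object],
) -> Optional[dict]:
    """Sketch of the criteria merge before an output evaluator runs."""
    if eval_expected_output is None:
        return criteria  # no evaluation-level fallback: existing behavior
    if criteria is None:
        # No per-evaluator entry: build one from the evaluation level.
        return {"expectedOutput": eval_expected_output}
    if "expectedOutput" not in criteria:
        # Entry exists but lacks expectedOutput: merge the fallback in.
        return {**criteria, "expectedOutput": eval_expected_output}
    return criteria  # per-evaluator expectedOutput wins
```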
Jira
AE-1066
Spec
Confluence: Evaluation-Level ExpectedOutput Schema Enhancement
🤖 Generated with Claude Code